Abstract: We consider a pair of correlated processes
$\{Z_n\}_{n=-\infty}^{\infty}$ and $\{S_n\}_{n=-\infty}^{\infty}$, where the former is observable
and the latter is hidden. The uncertainty in the estimation of
$Z_n$ from its finite past history $Z_0^{n-1}$ is $H(Z_n|Z_0^{n-1})$, and the
uncertainty in the estimation of $S_n$ from the same observations is
$H(S_n|Z_0^{n-1})$; both are sequences in $n$. The limits of these sequences
(and their existence) are of practical interest. The first limit, if it
exists, is the entropy rate. We call the second limit the estimation entropy.
An example of a process jointly correlated to another one is the
hidden Markov process. It is the memoryless observation of a
Markov state process whose state transitions are independent of
past observations. We consider a new representation of the hidden
Markov process using an iterated function system. In this representation
the state transitions are deterministically related to the
process. Using this representation we analyze the two dynamical
entropies for this process, which results in integral expressions
for the limits. This analysis shows that under mild conditions
the limits exist, and it provides a simple method for calculating the
elements of the corresponding sequences.

Index Terms — entropy rate, hidden Markov process, iterated 
function system, estimation entropy. 



I. Introduction 

A stochastic process which is a noisy observation of a
Markov process through a memoryless channel is called a hidden
Markov process (HMP). In many applications of stochastic
signal processing, such as radar and speech processing, the
output of the information source can be considered as an
HMP. The entropy rate of an HMP, as the limit of compressibility
of the information source, is thus of special interest in those
applications. Moreover, in additive noise channels the noise
process can be characterized as a hidden Markov process, and
its entropy rate is the defining factor in the capacity of the channel.
Finding the entropy rate of the hidden Markov process is
therefore motivated by applications in stochastic signal
processing, source coding and channel capacity computation.

The study of the entropy rate of HMP was started in 1957 by
Blackwell [1], who obtained an integral expression for the
entropy rate. This expression is defined through a measure
described by an integral equation, and this measure is hard to
extract from the equation in any explicit way. Bounds on the entropy rate
can be computed based on the conditional entropies on sets
of a finite number of random variables [2]. Recent approaches
for calculating the entropy rate are Monte Carlo simulation
[3] and the Lyapunov exponent [3], [4]. However, these approaches
yield non-deterministic and hard-to-evaluate expressions. Simple
expressions for the entropy rate have recently been obtained for
special cases where the parameters of the hidden Markov source
approach zero [4], [5].

The hidden Markov process is a process defined through
its stochastic relation to another process. The entropy rate
of HMP thus corresponds to this relation and to the dynamics
of the underlying process. However, this entropy rate only
indicates the residual uncertainty in the symbol one step ahead
of the observation of the process itself. It does not indicate our
uncertainty about the underlying process. In this paper we
define estimation entropy as a variation of entropy rate to
indicate this uncertainty. In general, for a pair of correlated
processes, one of which is hidden and the other
observable, we can define estimation entropy as the long-run
per-symbol uncertainty in the estimation of the hidden process
based on the past observations. Such an entropy measure will
be an important criterion for evaluating the performance of
an estimator. In this paper we jointly analyze the entropy
rate and estimation entropy for a hidden Markov process.
This analysis is based on a mathematical model, namely the
iterated function system [6], which suits the dynamics of the
information state process of the HMP. This analysis results
in integral expressions for these two dynamical entropies.
We also derive a numerical method for iteratively calculating
the entropy rate and estimation entropy for HMP.

In this paper a discrete random variable is denoted by
upper case and its realization by lower case. A sequence
of random variables $X_0, X_1, X_2, \ldots, X_n$ is denoted by $X_0^n$,
whereas $X^n$ refers to $X_{-\infty}^n$. The probability $\Pr(X = x)$
is shown by $p(x)$ (similarly for conditional probabilities),
whereas $p(X)$ represents a row vector as the distribution of
$X$, i.e., the $k$-th element of the vector $p(X)$ is $\Pr(X = k)$.
For a random variable $X$ defined on a set $\mathcal{X}$, we denote by
$\nabla_{\mathcal{X}}$ the probability simplex in $\mathbb{R}^{|\mathcal{X}|}$. A specific element of a
vector or matrix is referred to by its index in square brackets
or as a subscript. The $i$-th row of matrix $A$ is represented
by $A^{(i)}$. The entropy of a random variable $X$ is denoted
by $H(X)$, whereas $h : \nabla_{\mathcal{X}} \to \mathbb{R}^+$ represents the entropy
function over $\nabla_{\mathcal{X}}$, i.e., $h(p(X)) = H(X)$ for all possible
random variables $X$ on $\mathcal{X}$. Our notation does not distinguish
differential entropies from ordinary entropies.

In the next section we define the iterated function system
and draw some results from [6], as well as a new result. In
Section III we define the hidden Markov process by identifying
the key properties for the probability distributions on the
corresponding domain sets, and show that such a process can
be represented by an iterated function system. In Sections
IV and V we derive integral expressions for entropy rate and
estimation entropy, followed by a method for calculating them.
II. Iterated Function Systems

Consider a system with a state in the space $A$, where
the state transitions depend deterministically on a correlated
process taking values in a set $\mathcal{I}_m = \{1, 2, \ldots, m\}$, and
this process depends stochastically on the state. The mathematical model
representing such a system is an iterated function system (IFS),
which is defined by $m$ functions transforming a metric space
$A$ to itself, and $m$ place-dependent probabilities.

Definition 1: A triple $\mathcal{F} = (A, F_i, q_i)_{i=1,2,\ldots,m}$ is an iterated
function system if $F_i : A \to A$ and $q_i : A \to \mathbb{R}^+$ are
measurable functions and $\sum_i q_i = 1$.

The IFS represents the above-mentioned dynamical system,
where the probability of event $i \in \mathcal{I}_m$ under state $x \in A$ is
$q_i(x)$ and the consequence of such an event is the change of state
to $F_i(x)$.

Although the generality of IFS allows the functions
$F_i$ and $q_i$ to be measurable, which covers a wide range of real
functions, in this paper we are only interested in a subset of
those functions, the continuous functions. Such systems are
referred to as continuous IFS. If the functions $F_i$ are only
defined on $A_i \subset A$, where $A_i = \{x \in A : q_i(x) > 0\}$,
then the IFS is called a partial iterated function system (PIFS).
Although the general application of IFS in this paper could
involve PIFS, we avoid such complexity by restricting the
application.

Consider $\mathcal{M}^1(A)$ as the space of probability measures on
$A$. For an IFS $\mathcal{F}$ we define an operator $\Phi : \mathcal{M}^1(A) \to \mathcal{M}^1(A)$,

$$(\Phi\mu)(B) = \sum_{i=1}^{m} \int_A 1_B(F_i(x))\, q_i(x)\, \mu(dx), \tag{1}$$

for $\mu \in \mathcal{M}^1(A)$ and $B \subset A$. The operator $\Phi$, induced
by $\mathcal{F}$, represents the evolution of probability measures under
the action of $\mathcal{F}$. More specifically, if our belief on the state
of the system at time $n$ is the probability measure $\mu_n$ ($\mu_n \in
\mathcal{M}^1(A)$), then this belief at time $n + 1$ is

$$\mu_{n+1} = \Phi\mu_n, \tag{2}$$

which can be easily verified from Equation (1) and the roles of the
functions $F_i$ and $q_i$. Note that the operator $\Phi$ is deterministic
and affine, i.e., $\Phi(\lambda\mu_1 + (1 - \lambda)\mu_2) = \lambda\Phi\mu_1 + (1 - \lambda)\Phi\mu_2$.
By such a representation $\Phi$ is a so-called Markov operator.

For a Markov operator $\Phi$ acting on the space $\mathcal{M}^1(A)$, a
measure $\mu \in \mathcal{M}^1(A)$ is invariant if $\mu = \Phi\mu$, and it is
attractive if

$$\mu = \lim_{n \to \infty} \Phi^n \nu, \tag{3}$$

for any $\nu \in \mathcal{M}^1(A)$. A Markov operator $\Phi$ (and the
corresponding IFS) is called asymptotically stable if it admits
an invariant and attractive measure. The concept of limit in
Equation (3) is convergence in the weak topology, meaning

$$\lim_{n \to \infty} \int f\, d(\Phi^n \nu) = \int f\, d\mu, \tag{4}$$

for any continuous bounded function $f$. Note that the limit
does not necessarily exist, and it is not necessarily unique. The
set of all attractive measures of $\Phi$ for $\mathcal{F}$ is denoted by $S_{\mathcal{F}}$.

A Markov operator which is continuous in the weak topology
is a Feller operator. We can show that for a continuous IFS
the operator $\Phi$ is a Feller operator. In this case any $\mu \in S_{\mathcal{F}}$
is invariant.

Let $B(A)$ be the space of all real-valued continuous
bounded functions on $A$. A special property of a Feller
operator $\Phi : \mathcal{M}^1(A) \to \mathcal{M}^1(A)$ is that there exists an
operator $U : B(A) \to B(A)$ such that

$$\int f(x)\, \Phi\mu(dx) = \int Uf(x)\, \mu(dx), \tag{5}$$

for all $f \in B(A)$, $\mu \in \mathcal{M}^1(A)$. The operator $U$ is called
the operator conjugate to $\Phi$. It can be shown [6] that for a
continuous IFS the operator conjugate to $\Phi$ is $U$, where

$$(Uf)(x) = \sum_{i=1}^{m} q_i(x)\, f(F_i(x)). \tag{6}$$



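For an IFS, the concept of change of state and probability of
the correlated process in each step can be extended to $n > 1$
steps. For an $\mathbf{i} = (i_1, i_2, \ldots, i_n) \in \mathcal{I}_m^n$, we denote

$$F_{\mathbf{i}}(x) = F_{i_n}(F_{i_{n-1}}(\cdots F_{i_1}(x) \cdots)),$$

$$q_{\mathbf{i}}(x) = q_{i_1}(x)\, q_{i_2}(F_{i_1}(x)) \cdots q_{i_n}(F_{i_{n-1}}(F_{i_{n-2}}(\cdots F_{i_1}(x)))).$$

Then the probability of the sequential event $\mathbf{i}$ under state $x \in
A$ is $q_{\mathbf{i}}(x)$, and as a result of such a sequence the state changes
from $x$ to $F_{\mathbf{i}}(x)$ in $n$ steps. As an extension of (6), we can
show

$$(U^n f)(x) = \sum_{\mathbf{i} \in \mathcal{I}_m^n} q_{\mathbf{i}}(x)\, f(F_{\mathbf{i}}(x)). \tag{7}$$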
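The agreement between repeated application of $U$ and the sum over index sequences in (7) can be checked numerically. The following Python sketch is an illustration only: it uses a hypothetical two-map IFS on the interval $[0, 1]$, and the maps `F`, the probabilities `q`, and all function names are assumptions that do not come from [6].

```python
from itertools import product

# Hypothetical two-map IFS on [0, 1] (illustrative assumption, not from the paper).
F = [lambda x: 0.5 * x, lambda x: 0.5 * x + 0.5]           # state maps F_1, F_2
q = [lambda x: 0.25 + 0.5 * x, lambda x: 0.75 - 0.5 * x]   # place-dependent probabilities, sum to 1


def apply_U(f):
    """Return Uf, where (Uf)(x) = sum_i q_i(x) f(F_i(x)), as in (6)."""
    return lambda x: sum(qi(x) * f(Fi(x)) for Fi, qi in zip(F, q))


def U_power(f, n):
    """Return U^n f by applying U n times."""
    for _ in range(n):
        f = apply_U(f)
    return f


def U_power_by_sequences(f, n, x):
    """Evaluate (U^n f)(x) through (7): sum over all index sequences of q_i(x) f(F_i(x))."""
    total = 0.0
    for seq in product(range(len(F)), repeat=n):
        prob, state = 1.0, x
        for i in seq:
            prob *= q[i](state)   # probability of event i at the current state
            state = F[i](state)   # deterministic change of state
        total += prob * f(state)
    return total


if __name__ == "__main__":
    square = lambda x: x * x
    print(U_power(square, 4)(0.3), U_power_by_sequences(square, 4, 0.3))
```

Both evaluations visit the same $m^n$ index sequences, so the cost grows exponentially in $n$; the sketch is meant for small $n$ only.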

In this paper we define, for a given continuous IFS and for
an $f \in B(A)$,

$$F(x) \triangleq \lim_{n \to \infty} (U^n f)(x). \tag{8}$$

Now we state our result on IFS in the following lemma,
which will be used in Section IV as the major application of
IFS to the purpose of this paper.

Lemma 1: For a continuous IFS $\mathcal{F} = (A, F_i, q_i)_{i=1,2,\ldots,m}$,
and any function $f \in B(A)$,

$$F(x) = \int f\, d\mu, \tag{9}$$

where $\mu = \lim_{n \to \infty} \Phi^n \delta_x$ (if the limit exists), and $\delta_x \in
\mathcal{M}^1(A)$ is a distribution with all probability mass at $x$.
Proof: From (5) we have

$$\int f\, d(\Phi^2\mu) = \int (Uf)\, d(\Phi\mu) = \int (U^2 f)\, d\mu,$$

where the first equality is by substituting $\mu$ with $\Phi\mu$ in (5)
and the second equality by substituting $f$ with $Uf$. Therefore,
by repetition of (5), we have

$$\int f\, d(\Phi^n\mu) = \int (U^n f)\, d\mu, \tag{10}$$

for all $f \in B(A)$, $\mu \in \mathcal{M}^1(A)$. This results in

$$F(x) = \lim_{n \to \infty} \int (U^n f)\, d\delta_x = \lim_{n \to \infty} \int f\, d(\Phi^n \delta_x) = \int f\, d\mu,$$

where the first equality is from the definition of $F$ in (8), the
second is from (10), and the last one is from (4).
From the above lemma we infer that for an asymptotically
stable continuous IFS, the function $F$ is a constant independent
of $x$. Note that asymptotic stability ensures that there exists at
least one $\mu$ satisfying (3) for any $\nu \in \mathcal{M}^1(A)$, which is true
for $\nu = \delta_x$ for any $x$. If there is more than one $\mu \in S_{\mathcal{F}}$, all
of them have to satisfy (3). So in this case the equality (9),
independent of $x$, is true for any $\mu \in S_{\mathcal{F}}$.

We use the result of this section in the analysis of entropy 
measures of hidden Markov processes by specializing A to be 
the space of information state process and / to be variations 
of the entropy function. 

III. The Hidden Markov Process 

A hidden Markov process is a process related to an underlying
Markov process through a discrete memoryless channel,
so it is defined (for finite alphabet cases) by the transition
probability matrix $P$ of the Markov process and the emission
matrix $T$ of the memoryless channel [7], [8]. In this paper
the hidden Markov process is referred to by $\{Z_n\}_{n=-\infty}^{\infty}$,
$Z_n \in \mathcal{Z}$, and its underlying Markov process by $\{S_n\}_{n=-\infty}^{\infty}$,
$S_n \in \mathcal{S}$. The elements of the matrices $P_{|\mathcal{S}| \times |\mathcal{S}|}$ and $T_{|\mathcal{S}| \times |\mathcal{Z}|}$
are the conditional probabilities

$$P[s, s'] = p(S_{n+1} = s' \mid S_n = s), \qquad T[s, z] = p(Z_n = z \mid S_n = s). \tag{11}$$



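A pair of matrices $P$ and $T$ define a time-invariant (but not
necessarily stationary) hidden Markov process on the state set
$\mathcal{S}$ and observation set $\mathcal{Z}$ by the following basic properties, for
any $n$.

• A1: Markovity,

$$p(s_n \mid s^{n-1}) = p_P(s_n \mid s_{n-1}), \tag{12}$$

where $p_P(s_n \mid s_{n-1}) = P[s_{n-1}, s_n]$.

• A2: Sufficient Statistics of State,

$$p(s_n \mid s_{n-1}, z^{n-1}) = p_P(s_n \mid s_{n-1}), \tag{13}$$

where $p_P(\cdot \mid \cdot)$ is defined by $P$.

• A3: Memoryless Observation,

$$p(z^n \mid s^n) = \prod_{i=-\infty}^{n} p_T(z_i \mid s_i), \tag{14}$$

where $p_T(z \mid s) = T[s, z]$.
Property A3 implies:

$$p(z_n \mid s_n, z^{n-1}) = p_T(z_n \mid s_n). \tag{15}$$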



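For a hidden Markov process we define two random vectors
$\pi_n$ and $\rho_n$ as functions of $Z^{n-1}$ on the domains $\nabla_{\mathcal{S}}$, $\nabla_{\mathcal{Z}}$,
respectively,

$$\pi_n(Z^{n-1}) = p(S_n \mid Z^{n-1}), \tag{16}$$

$$\rho_n(Z^{n-1}) = p(Z_n \mid Z^{n-1}). \tag{17}$$

According to our notation, the random vector $\pi_n$ has elements
$\pi_n[k]$, $k = 1, 2, \ldots, |\mathcal{S}|$,

$$\pi_n[k] = p(S_n = k \mid Z^{n-1}),$$

and similarly for $\rho_n$. We obtain the relation between the random
vectors $\rho_n$ and $\pi_n$,

$$\begin{aligned}
\rho_n[m](Z^{n-1}) &= \Pr(Z_n = m \mid Z^{n-1}) \\
&= \sum_k \Pr(Z_n = m \mid Z^{n-1}, S_n = k)\, \Pr(S_n = k \mid Z^{n-1}) \\
&= \sum_k \Pr(Z_n = m \mid S_n = k)\, \Pr(S_n = k \mid Z^{n-1}) \\
&= \sum_k T[k, m]\, \pi_n[k](Z^{n-1}),
\end{aligned} \tag{18}$$

which shows the matrix relation

$$\rho_n = \pi_n T. \tag{19}$$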



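More generally, we refer to $\zeta(\pi) \in \nabla_{\mathcal{Z}}$ as the projection of
$\pi \in \nabla_{\mathcal{S}}$ under the mapping $T : \nabla_{\mathcal{S}} \to \nabla_{\mathcal{Z}}$, i.e.,

$$\zeta(\pi) = \pi T. \tag{20}$$

We can write

$$p(Z_n \mid \pi_n, Z^{n-1}) = p(Z_n \mid Z^{n-1}) = \pi_n T = \zeta(\pi_n), \tag{21}$$

where the first equality is due to $\pi_n$ being a function of $Z^{n-1}$.
Since the right-hand side of (21) is (only) a function of $\pi_n$
(and it is a distribution on $\mathcal{Z}$), the left-hand side must be equal
to $p(Z_n \mid \pi_n)$, i.e., we have shown

$$p(Z_n \mid \pi_n) = p(Z_n \mid \pi_n, Z^{n-1}) = \zeta(\pi_n). \tag{22}$$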



This shows that $\pi_n$ is a sufficient statistic for the observation
process at time $n$. By a similar argument we have

$$p(S_n \mid \pi_n, Z^{n-1}) = p(S_n \mid Z^{n-1}) = \pi_n(Z^{n-1}), \tag{23}$$

which shows that $\pi_n$ is a sufficient statistic for the state
process at time $n$. In other words, the random vector $\pi_n$
encapsulates all information about the state at time $n$ that can
be obtained from all the past observations $Z^{n-1}$. For this
reason we call $\pi_n$ the information state at time $n$. A similar
definition for the information state with the same property has
been given for the more general model of partially observed
Markov decision processes in [9].

Using Bayes' rule and the law of total probability, an
iterative formula for the information state can be obtained as
a function of $z_n$ [9], [10],

$$\pi_{n+1} = \eta(z_n, \pi_n), \tag{24}$$

where

$$\eta(z, \pi) \triangleq \frac{\pi D(z) P}{\pi D(z) \mathbf{1}}, \tag{25}$$

where $D(z)$ is a diagonal matrix with $d_{k,k}(z) = T[k, z]$, $k =
1, 2, \ldots, |\mathcal{S}|$.
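The update (24)-(25) is the prediction form of the usual filtering recursion: weight the current information state by the likelihood of the observed symbol, normalize, and propagate through $P$. A minimal Python sketch follows, assuming $P$ and $T$ are given as nested lists of row-stochastic rows; the function names `zeta` and `eta` mirror the symbols of the text, while the example matrices are hypothetical values chosen only for illustration.

```python
def zeta(pi, T):
    """Predicted observation distribution zeta(pi) = pi T (row vector times matrix)."""
    return [sum(pi[k] * T[k][z] for k in range(len(pi))) for z in range(len(T[0]))]


def eta(z, pi, P, T):
    """Information-state update (25): pi D(z) P / (pi D(z) 1), with D(z) = diag(T[k][z])."""
    weighted = [pi[k] * T[k][z] for k in range(len(pi))]   # row vector pi D(z)
    norm = sum(weighted)                                    # pi D(z) 1, equal to zeta(pi)[z]
    if norm == 0.0:
        raise ValueError("observation z has zero probability under pi")
    return [sum(weighted[k] * P[k][j] for k in range(len(pi))) / norm
            for j in range(len(P))]


if __name__ == "__main__":
    # Hypothetical two-state example (values are assumptions for illustration).
    P = [[0.9, 0.1], [0.2, 0.8]]
    T = [[0.7, 0.3], [0.4, 0.6]]
    pi = [0.5, 0.5]
    for z in [0, 1, 1, 0]:          # an arbitrary observation sequence
        pi = eta(z, pi, P, T)
        print(z, pi)
```

The explicit check on `norm` makes visible the case in which $\eta(z, \pi)$ is undefined, which can only occur when $T$ has zero entries.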

Due to the sufficient statistic property of the information
state, we can consider the information state process $\{\pi_n\}_{n=0}^{\infty}$
on $\nabla_{\mathcal{S}}$ as the state process of an iterated function system
on $\nabla_{\mathcal{S}}$ with the hidden Markov process being its correlated
process. This is because the hidden Markov process at time
$k$ is stochastically related to the information state process at
that time by $\Pr(Z_k = z \mid \pi_k = x) = \zeta(x)[z]$ (from (22)). On
the other hand, $Z_k = z$ results in the deterministic change of
state from $\pi_k = x$ to $\pi_{k+1} = \eta(z, x)$. Consequently, for a
hidden Markov process there is a continuous iterated function
system defined by, for different values $z \in \mathcal{Z}$,

$$F_z(x) = \eta(z, x), \qquad q_z(x) = \zeta(x)[z], \tag{26}$$

where the equality $\sum_z q_z(x) = 1$, $x \in \nabla_{\mathcal{S}}$, is satisfied
due to $\zeta(x) \in \nabla_{\mathcal{Z}}$. These functions are in fact conditional
probabilities, $F_z(x) = p(S_{k+1} \mid Z_k = z, \pi_k = x)$ and $q_z(x) =
\Pr(Z_k = z \mid \pi_k = x)$ for any $k$.

If the emission matrix $T$ has zero entries, then the function
$\eta(z, x)$ could be undefined for some $(z, x)$. This happens for
those $x \in \nabla_{\mathcal{S}}$ for which the element $z$ of the vector $xT$ is zero¹, i.e., the
function $F_z(x)$ is only defined for $x$ such that $q_z(x) > 0$. Hence,
for a general choice of matrix $T$ we have a PIFS associated
with the hidden Markov process. For this and other reasons that
will become apparent later, we assume that the matrix $T$ has no zero entries.

For the continuous IFS related to the hidden Markov process,
we can obtain the corresponding Feller operator $\Phi$ and
its conjugate operator $U$. The operator $U$ maps any $f \in B(A)$
to $Uf \in B(A)$, where

$$\begin{aligned}
(Uf)(x) &= \sum_z q_z(x)\, f(F_z(x)) \\
&= \sum_z \Pr(Z_k = z \mid \pi_k = x)\, f(p(S_{k+1} \mid Z_k = z, \pi_k = x)).
\end{aligned} \tag{27}$$

In general, given $\pi_k = x$, the probability of a specific
$n$-sequence $\mathbf{z} = (z_1, z_2, \ldots, z_n)$ for the HMP is

$$\Pr(Z_k^{k+n-1} = \mathbf{z} \mid \pi_k = x) = q_{z_1}(x)\, q_{z_2}(F_{z_1}(x)) \cdots q_{z_n}(F_{z_{n-1}}(F_{z_{n-2}}(\cdots F_{z_1}(x)))), \tag{28}$$

and this sequence changes the state to

$$\pi_{k+n} = F_{z_n}(F_{z_{n-1}}(\cdots F_{z_1}(x) \cdots)).$$

Therefore we can write

$$p(S_{k+n} \mid Z_k^{k+n-1} = \mathbf{z}, \pi_k = x) = F_{z_n}(F_{z_{n-1}}(\cdots F_{z_1}(x) \cdots)). \tag{29}$$

Comparing to (7), we infer for any $f \in B(A)$ and for all $k$,

$$(U^n f)(x) = \sum_{\mathbf{z}} \Pr(Z_k^{k+n-1} = \mathbf{z} \mid \pi_k = x)\, f(p(S_{k+n} \mid Z_k^{k+n-1} = \mathbf{z}, \pi_k = x)). \tag{30}$$

For example, for the entropy function $h$,

$$h(x) = -\sum_{i=1}^{|\mathcal{S}|} x[i] \log(x[i]), \qquad x \in \nabla_{\mathcal{S}}, \tag{31}$$

we have for any $k$,

$$(U^n h)(x) = H(S_{k+n} \mid Z_k^{k+n-1}, \pi_k = x).$$

The IFS corresponding to an HMP is shown to be asymptotically stable
under a wide range of the parameters of the process.

Definition 2: A stochastic matrix $P$ is primitive if there
exists an $n$ such that $(P^n)_{i,j} > 0$ for all $i, j$.

Lemma 2: For a primitive matrix $P$ and an emission matrix
$T$ with strictly positive entries, the IFS defined according to
(26) is asymptotically stable.

Proof: The proof follows from [6, Theorem 8.1]. The
IFS defined in [6, Theorem 8.1] by

$$F_i(x)[j] = \frac{\sum_{l} x_l T_{li} P_{lj}}{\sum_{l} x_l T_{li}} = \eta(i, x)[j], \qquad q_i(x) = \sum_{l} x_l T_{li} = \zeta(x)[i],$$

is the same as the IFS defined by (26). It is shown in [6,
Theorem 8.1] that under the conditions of this lemma the IFS is
asymptotically hyperbolic, which then has to be asymptotically
stable according to [6, Theorem 3.4]. ■

A Markov chain with primitive transition matrix $P$ is geometrically
ergodic and has a unique stationary distribution [7].

¹ E.g., if $T_{1,1} = T_{2,1} = 0$, then for all $\pi$ that have zero components
from the third element onward, both the numerator and the denominator of (25) for
$z = 1$ will be zero, and for those $\pi$ the first component of $\pi T$ is zero.

IV. Entropy Rate and Estimation Entropy

The entropy of a random variable $Z \in \mathcal{Z}$ is a function of
its distribution $p(Z) \in \nabla_{\mathcal{Z}}$,

$$H(Z) = h(p(Z)) = -\sum_{z} p(z) \log p(z).$$

For a general process $\{Z_n\}_{n=-\infty}^{\infty}$, the entropy of any
$n$-sequence $Z_k^{k+n-1}$ is denoted by $H(Z_k^{k+n-1})$, which is defined
by the joint probabilities $\Pr(Z_k^{k+n-1} = \mathbf{z})$, for all $\mathbf{z} \in \mathcal{Z}^n$.
For a stationary process these joint probabilities are invariant
with $k$. The entropy rate of the process is denoted by $H_Z$ and
defined as

$$H_Z = \lim_{n \to \infty} \frac{1}{n} H(Z_0^{n-1}), \tag{32}$$

when the limit exists. Let

$$\sigma_n \triangleq H(Z_n \mid Z_0^{n-1}) = H(Z_0^n) - H(Z_0^{n-1}).$$

We see that the entropy rate is the limit of the Cesaro mean of
the sequence $\sigma_n$, i.e.,

$$H_Z = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \sigma_i. \tag{33}$$



We know that if the sequence $\sigma_n$ converges, then the
sequence of its Cesaro mean also converges to the same limit
[2, Theorem 4.2.3]. However, the converse is not necessarily
true. Therefore, the entropy rate is equal to

$$H_Z = \lim_{n \to \infty} H(Z_n \mid Z_0^{n-1}), \tag{34}$$

when this limit exists, but the non-existence of this limit
does not mean that the entropy rate does not exist. On the other
hand, the sequence $\sigma_n$ converges faster than the sequence
in (33) to its limit. Therefore the convergence rate of (34) is
faster than that of (32). This fact was first pointed out in [11].
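The difference in convergence speed between the conditional entropy $H(Z_n \mid Z_0^{n-1})$ and the per-symbol block entropy $\frac{1}{n} H(Z_0^{n-1})$ can be observed on a small example by brute-force enumeration. The Python sketch below is illustrative only: the two-state matrices and the initial distribution are hypothetical values, the function names are assumptions, and the enumeration over all $|\mathcal{Z}|^n$ sequences restricts it to small $n$.

```python
import math
from itertools import product


def sequence_probability(seq, x0, P, T):
    """Pr(Z_0^{n-1} = seq) by a forward recursion, with S_0 distributed as x0."""
    alpha = list(x0)
    for z in seq:
        weighted = [alpha[k] * T[k][z] for k in range(len(alpha))]    # absorb observation z
        alpha = [sum(weighted[k] * P[k][j] for k in range(len(P)))    # move the state one step
                 for j in range(len(P))]
    return sum(alpha)


def block_entropy(n, x0, P, T):
    """H(Z_0^{n-1}) in bits, by enumerating all |Z|^n observation sequences."""
    total = 0.0
    for seq in product(range(len(T[0])), repeat=n):
        p = sequence_probability(seq, x0, P, T)
        if p > 0.0:
            total -= p * math.log2(p)
    return total


if __name__ == "__main__":
    # Hypothetical example; x0 is the stationary distribution of this P.
    P = [[0.9, 0.1], [0.2, 0.8]]
    T = [[0.7, 0.3], [0.4, 0.6]]
    x0 = [2.0 / 3.0, 1.0 / 3.0]
    for n in range(1, 9):
        h_n = block_entropy(n, x0, P, T)
        conditional = block_entropy(n + 1, x0, P, T) - h_n   # H(Z_n | Z_0^{n-1})
        print(n, conditional, h_n / n)                        # conditional vs. per-symbol block entropy
```

For a stationary start the conditional entropy is never larger than the per-symbol block entropy, since the block entropy is a sum of the earlier, larger conditional entropies.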

One sufficient condition for the existence of the limit of $\sigma_n$
is the stationarity of the process. For a stationary process

$$\sigma_n = H(Z_{n+1} \mid Z_1^n) \ge H(Z_{n+1} \mid Z_0^n) = \sigma_{n+1} \ge 0, \tag{35}$$

which shows that $\sigma_n$ must have a limit. Therefore, for a
stationary process we can write the entropy rate as (34). For a
stationary Markov process with transition matrix $P$ the entropy
rate is

$$H_Z = \lim_{n \to \infty} H(Z_n \mid Z_{n-1}) = H(Z_1 \mid Z_0) = \sum_i x[i]\, h(P^{(i)}), \tag{36}$$

where $x \in \nabla_{\mathcal{Z}}$ is the stationary distribution of the Markov
process, i.e., the solution of $xP = x$. Of special interest to this
paper is the entropy rate of the hidden Markov process.
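Formula (36) is straightforward to evaluate once the stationary distribution is available. The Python sketch below approximates the solution of $xP = x$ by power iteration, which is an assumption of this illustration (it presumes a primitive $P$) rather than a method taken from the text; the function names and the example matrix are likewise hypothetical, and entropies are computed in bits.

```python
import math


def stationary_distribution(P, iterations=10_000):
    """Approximate the solution of xP = x by power iteration (assumes P is primitive)."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(iterations):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x


def entropy(p):
    """Entropy function h(p) in bits, with the convention 0 log 0 = 0."""
    return -sum(v * math.log2(v) for v in p if v > 0.0)


def markov_entropy_rate(P):
    """Entropy rate (36): sum_i x[i] h(P^(i)), where P^(i) is the i-th row of P."""
    x = stationary_distribution(P)
    return sum(x[i] * entropy(P[i]) for i in range(len(P)))


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical transition matrix
    print(stationary_distribution(P), markov_entropy_rate(P))
```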

We can extend the concept of entropy rate to a pair of
correlated processes. Assume we have jointly correlated
processes $\{Z_n\}_{n=-\infty}^{\infty}$ and $\{S_n\}_{n=-\infty}^{\infty}$, where we observe the
first process and, based on our observation, estimate the state
of the other process. The uncertainty in the estimation of $S_n$
from past observations $Z_0^{n-1}$ is $H(S_n \mid Z_0^{n-1})$. The limit of
this sequence, which inversely measures the observability of
the hidden process, is of practical and theoretical interest. We
call this limit the estimation entropy,

$$H_{S/Z} = \lim_{n \to \infty} H(S_n \mid Z_0^{n-1}), \tag{37}$$

when the limit exists. Similar to entropy rate, we can consider
the limit of the Cesaro mean of the sequence $\beta_n = H(S_n \mid Z_0^{n-1})$
(i.e., $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \beta_i$) as the estimation entropy, which
gives a more relaxed condition on its existence, but it will
have a much slower convergence rate. However, if both limits
exist, then they will be equal. If the two processes $\{Z_n\}_{n=-\infty}^{\infty}$
and $\{S_n\}_{n=-\infty}^{\infty}$ are jointly stationary, then $\beta_n$ is decreasing
and non-negative (same as (35)), thus the limit in (37) exists.
We will see that the limits in (34) and (37) also exist for a wide
range of non-stationary processes.

A practical application of estimation entropy is, for example, in
sensor scheduling for observation of a Markov process [12].
The aim of such a scheduler is to find a policy for selection
of sensors based on the information state which minimizes the
estimation entropy, thus achieving the maximum observability
for the Markov process. This entropy measure could also be
related to the error probability in channel coding. The higher
the estimation entropy, the more uncertainty per symbol in
the decoding process of the received signal, and thus the higher the error
probability. The estimation entropy can be viewed as a benchmark
for indicating how well an estimator is working. It is the
limit of the minimum uncertainty that an estimator can achieve for
estimating the current value of the unobserved process, given
a sufficiently long history of observations. We consider
HMP as a joint process and analyze its estimation entropy.

For a stationary hidden Markov process the entropy rate $H_Z$
and estimation entropy $H_{S/Z}$ are the limiting expectations

$$H_Z = \lim_{n \to \infty} E[h(\rho_n)], \qquad H_{S/Z} = \lim_{n \to \infty} E[h(\pi_n)]. \tag{38}$$

However, since $\pi_n$ and $\rho_n$ are functions of joint distributions
of the random variables $Z^{n-1}$, these expectations are not directly
computable. We use the IFS for a hidden Markov process to
gain insight into these entropy measures in a more general
setting without the stationarity assumption.

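Adapting Equation (1) with the special functions $F_z(x)$ and
$q_z(x)$ in (26), we obtain the Feller operator $\Phi$ for the IFS
corresponding to a hidden Markov process,

$$(\Phi\mu)(B) = \sum_{z \in \mathcal{Z}} \int_{\nabla_{\mathcal{S}}} 1_B(\eta(z, x))\, \zeta(x)[z]\, \mu(dx). \tag{39}$$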



To analyze the entropy measures $H_Z$ and $H_{S/Z}$, we define
two intermediate functions

$$H_Z(x) = \lim_{n \to \infty} H(Z_n \mid Z_0^{n-1}, \pi_0 = x), \qquad H_{S/Z}(x) = \lim_{n \to \infty} H(S_n \mid Z_0^{n-1}, \pi_0 = x). \tag{40}$$

In comparison to (34) and (37), these functions are the
corresponding per-symbol entropies when conditioned on
a specific prior distribution of the state at time $n = 0$. We now use
Lemma 1 to obtain integral expressions for these limiting
entropies.

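Lemma 3: For a hidden Markov process

$$H_Z(x) = \int_{\nabla_{\mathcal{S}}} (h_1 \circ \zeta)\, d\mu, \qquad H_{S/Z}(x) = \int_{\nabla_{\mathcal{S}}} h_2\, d\mu, \tag{41}$$

where $\mu = \lim_{n \to \infty} \Phi^n \delta_x$, and $h_1 : \nabla_{\mathcal{Z}} \to \mathbb{R}^+$ and $h_2 :
\nabla_{\mathcal{S}} \to \mathbb{R}^+$ are entropy functions.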

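Proof: From the definition of conditional entropy we write

$$H(Z_n \mid Z_0^{n-1}, \pi_0 = x) = \sum_{\mathbf{z}} \Pr(Z_0^{n-1} = \mathbf{z} \mid \pi_0 = x)\, h_1(p(Z_n \mid Z_0^{n-1} = \mathbf{z}, \pi_0 = x)). \tag{42}$$

Now since (as in (18), using $p(z_n \mid s_n, z^{n-1}, \pi_0) = p(z_n \mid s_n)$)

$$p(Z_n \mid Z_0^{n-1} = \mathbf{z}, \pi_0 = x) = \zeta(p(S_n \mid Z_0^{n-1} = \mathbf{z}, \pi_0 = x)), \tag{43}$$

Equation (42) can be written as

$$H(Z_n \mid Z_0^{n-1}, \pi_0 = x) = \sum_{\mathbf{z}} \Pr(Z_0^{n-1} = \mathbf{z} \mid \pi_0 = x)\, h_1 \circ \zeta(p(S_n \mid Z_0^{n-1} = \mathbf{z}, \pi_0 = x)). \tag{44}$$

Similarly, from the definition of conditional entropy, we can write

$$H(S_n \mid Z_0^{n-1}, \pi_0 = x) = \sum_{\mathbf{z}} \Pr(Z_0^{n-1} = \mathbf{z} \mid \pi_0 = x)\, h_2(p(S_n \mid Z_0^{n-1} = \mathbf{z}, \pi_0 = x)). \tag{45}$$

Comparing Equation (44) with (30), we have

$$H_Z(x) = \lim_{n \to \infty} (U^n (h_1 \circ \zeta))(x). \tag{46}$$

Similarly, by (45),

$$H_{S/Z}(x) = \lim_{n \to \infty} (U^n h_2)(x). \tag{47}$$

Now considering Equation (8) and applying Lemma 1, we
obtain (41). ■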

Lemmas 2 and 3 result in integral expressions for entropy
rate and estimation entropy.

Theorem 1: For a hidden Markov process with primitive
matrix $P$ and emission matrix $T$ with strictly positive
entries,

$$H_Z = \int (h_1 \circ \zeta)\, d\mu, \qquad H_{S/Z} = \int h_2\, d\mu, \tag{48}$$

where $\mu$ is any attractive and invariant measure of the operator $\Phi$,
and $h_1$, $h_2$ are the entropy functions on $\nabla_{\mathcal{Z}}$, $\nabla_{\mathcal{S}}$, respectively.

Proof: From Lemma 2, under the condition of this
theorem, the continuous IFS corresponding to the HMP is
asymptotically stable. As discussed after Lemma 1, in this
case the functions $H_Z(x)$ and $H_{S/Z}(x)$ (in (46) and (47))
are independent of $x$, and the equalities of (41) are satisfied
for any attractive measure (which exists and is also an
invariant measure) of $\Phi$. The independence from $x$ of $H_Z(x)$
and $H_{S/Z}(x)$ in (40) results in the equalities in (48) for $H_Z$
and $H_{S/Z}$. Note that for a set of random variables $X, Y, Z$,
if $H(Y \mid Z, X = x)$ is invariant with $x$, then $H(Y \mid Z) =
H(Y \mid Z, X) = H(Y \mid Z, X = x)$. Moreover, from the existence
of the limit of $\sigma_n$ (defined before), this limit is equal to $H_Z$. ■

The first equality in the above theorem has been previously
obtained by a different approach in [13]. However, in [13] the
measure $\mu$ is restricted to be $\mu = \lim_{n \to \infty} \Phi^n \delta_{x^*}$, where $x^*$ is
the stationary distribution of the underlying Markov process
defined by $P$,

$$x^* P = x^*. \tag{49}$$

The integral expression for $H_Z$ in Theorem 1 is also the same
as the expression in [6, Proposition 8.1] for 6 = P. For this
case the integral expression is shown to be equal to both of
the following two entropy measures,

$$H(x^*) \triangleq -\lim_{n \to \infty} \frac{1}{n} \sum_{\mathbf{z} \in \mathcal{Z}^n} q_{\mathbf{z}}(x^*) \log(q_{\mathbf{z}}(x^*)),$$

$$H(\mu) \triangleq -\lim_{n \to \infty} \frac{1}{n} \sum_{\mathbf{z} \in \mathcal{Z}^n} \int q_{\mathbf{z}}(x)\, \mu(dx) \cdot \log\Big(\int q_{\mathbf{z}}(x)\, \mu(dx)\Big), \tag{50}$$

where $\mu$ is the attractive and invariant measure of $\Phi$ for the IFS
defined by (26). Considering $q_{\mathbf{z}}(x) = p(Z_0^{n-1} = \mathbf{z} \mid \pi_0 = x)$
for HMP (cf. (28)), the two equalities match with Lemma
3 and Theorem 1. However, the analysis in [6] is based on
a general and complex view of dynamical systems, where
the dynamics of the system is represented by a Markov operator
and the measurement process is separately represented by a
Markov pair, and this Markov pair corresponds to a PIFS.

The integral expression for $H_Z$ is also equivalent to
Blackwell's original formulation [1] by a change of variable
$x$ to $xP$. This is because the expression in [1] is derived based
on $\alpha_{n-1} = p(S_{n-1} \mid Z^{n-1})$ instead of $\pi_n = \alpha_{n-1} P$ in (16)
(cf. (25)). The measure of the integral also corresponds to this
change of variable. Note that the measure $\mu$ in (48) satisfies
(due to its invariance property)

$$\mu(B) = \sum_{z} \int_{F_z^{-1}(B)} (xT)[z]\, \mu(dx), \tag{51}$$

(cf. (39)), which is the same as the integral equation for
the measure in [1] if we change the integrand of (51) to
$r_z(x) = (xPT)[z]$ and instead of $F_z(x)$ use the function
$f_z(x) = xPD(z)/r_z(x)$ (derived from (25) by $\pi = \alpha P$,
satisfying $\alpha_{n+1} = f_z(\alpha_n)$).

V. A Numerical Algorithm 

Here we obtain a numerical method for computing the entropy
rate and estimation entropy based on Lemma 3 and the fact that,
under the condition of Theorem 1, (41) is independent of $x$. The
computational complexity of this method grows exponentially
with the iterations, but numerical examples show a very fast
convergence. In [14] it is shown that applying this method for
computation of the entropy rate yields the same capacity results
for symmetric Markov channels as previous results.
We write (41) as

$$H_Z(x) = \lim_{n \to \infty} \int (h_1 \circ \zeta)\, d\mu_n, \tag{52}$$

where $\mu_n = \Phi^n \delta_x$. Considering $\hat{\mu}_n : \nabla_{\mathcal{S}} \to \mathbb{R}$ as the
probability density function corresponding to the probability
measure $\mu_n$, from (2) and (39) we have the following recursive
formula:

$$\hat{\mu}_{n+1}(\pi_{n+1}) = \sum_{z} \int_{\nabla_{\mathcal{S}}} \delta(\pi_{n+1} - \eta(z, \pi_n))\, \zeta(\pi_n)[z]\, \hat{\mu}_n(\pi_n)\, d\pi_n. \tag{53}$$

Corresponding to the initial probability measure $\delta_\nu$, we have
the initial density function $\hat{\mu}_0(x) = \delta(x - \nu)$. Since $\hat{\mu}_0$ is a
probability mass function, Equation (53) yields a probability
mass function $\hat{\mu}_n$ for any $n$. For example, $\hat{\mu}_1(\cdot)$ is

$$\hat{\mu}_1(\pi_1) = \sum_{z} \delta(\pi_1 - \eta(z, \nu))\, \zeta(\nu)[z],$$

which is a $|\mathcal{Z}|$-point probability mass function. By induction
it can be shown that the distribution $\hat{\mu}_n$ for any $n$ is a
probability mass function over a finite set $U_n$ which consists of
$|\mathcal{Z}|^n$ points of $\nabla_{\mathcal{S}}$, $U_n = \{u \in \nabla_{\mathcal{S}} : u = \eta(z, v), z \in \mathcal{Z}, v \in
U_{n-1}\}$, $|U_n| = |\mathcal{Z}|^n$, $U_0 = \{\nu\}$. The probability distribution
over $U_n$ is $\hat{\mu}_n(u) = \hat{\mu}_{n-1}(v)\, \zeta(v)[z]$ for $u = \eta(z, v)$, $v \in
U_{n-1}$. Therefore, for every $v \in U_{n-1}$, $|\mathcal{Z}|$ points will be
generated in $U_n$ that correspond to $\eta(z, v)$ for different $z$, and
the probability of each of those points will be $\hat{\mu}_{n-1}(v)\, (vT)[z]$.

Starting from $U_0 = \{\nu\}$ for some $\nu \in \nabla_{\mathcal{S}}$, by the
above method we can iteratively generate the sets $U_n$ and the
probability distribution $\hat{\mu}_n(\cdot)$ over these sets. The integrals
in (52) can now be written as summations over $U_n$; therefore
the entropy rate and estimation entropy are the limits of the
following sequences:

$$H_Z^n = \sum_{u_i \in U_n} \hat{\mu}_n(u_i)\, h_1(u_i T), \qquad H_{S/Z}^n = \sum_{u_i \in U_n} \hat{\mu}_n(u_i)\, h_2(u_i). \tag{54}$$
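The construction of the sets $U_n$ and the sums in (54) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, $P$ and $T$ are nested lists with $T$ strictly positive, entropies are in bits, and the two-state matrices in the usage part are invented for the example. The list `points` holds the pairs $(u, \hat{\mu}_n(u))$ for $u \in U_n$, without merging coinciding points.

```python
import math


def entropy(p):
    """Entropy function h(p) in bits, with the convention 0 log 0 = 0."""
    return -sum(v * math.log2(v) for v in p if v > 0.0)


def zeta(u, T):
    """Projection zeta(u) = uT onto the observation simplex."""
    return [sum(u[k] * T[k][z] for k in range(len(u))) for z in range(len(T[0]))]


def eta(z, u, P, T):
    """Information-state update (25); assumes T has strictly positive entries."""
    w = [u[k] * T[k][z] for k in range(len(u))]
    norm = sum(w)
    return [sum(w[k] * P[k][j] for k in range(len(u))) / norm for j in range(len(P))]


def entropy_sequences(nu, P, T, steps):
    """Return the lists [H_Z^0, ..., H_Z^steps] and [H_S/Z^0, ..., H_S/Z^steps] of (54)."""
    points = [(list(nu), 1.0)]            # U_0 = {nu} with probability mass 1
    rate, estimation = [], []
    for n in range(steps + 1):
        if n > 0:
            # U_n: every point v of U_{n-1} spawns |Z| points eta(z, v) with mass m * zeta(v)[z].
            points = [(eta(z, v, P, T), m * pz)
                      for v, m in points
                      for z, pz in enumerate(zeta(v, T))]
        rate.append(sum(m * entropy(zeta(u, T)) for u, m in points))
        estimation.append(sum(m * entropy(u) for u, m in points))
    return rate, estimation


if __name__ == "__main__":
    # Hypothetical two-state example; [3/7, 4/7] is the stationary distribution of this P.
    P = [[0.6, 0.4], [0.3, 0.7]]
    T = [[0.7, 0.3], [0.4, 0.6]]
    for nu in ([3.0 / 7.0, 4.0 / 7.0], [0.9, 0.1]):
        rate, estimation = entropy_sequences(nu, P, T, steps=10)
        print(nu, rate[-1], estimation[-1])
```

The number of stored points is $|\mathcal{Z}|^n$ at step $n$, which reflects the exponential growth in complexity noted above; merging numerically identical points or pruning low-mass points would be possible refinements, but they are outside what the text describes.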



Figure 1 shows the convergence of the proposed method to
the entropy rate and estimation entropy for various starting
points $\nu$ for an example hidden Markov process. In this
example $\mathcal{S} = \mathcal{Z} = \{0, 1, 2, 3\}$, and

Although the result of Section IV ensures convergence of the
algorithm for any starting distribution $\nu$, this figure and other
numerical examples show faster convergence for $\nu = x^*$ (the
solution of (49)). Without the condition of Theorem 1 the
convergence could be to different values for various $\nu$. Among
various examples of HMP, the convergence will be slower
where the entropy rate of the underlying Markov process
with transition probability matrix $P$ ($H_Z$ in (36)) is very low
relative to $\log_2 |\mathcal{S}|$ (in the above example it is 0.678b relative
to 2b) or the rows of $T$ have high entropy.

The sequence $H_Z^n$, as the right-hand side of (52) for
finite $n > 0$, is in fact $H_Z^n = H(Z_n \mid Z_0^{n-1}, \pi_0 = \nu)$. If we
assume (as in [2]) that the process $Z_n$ starts at time zero, i.e., it is a
one-sided stationary process, then $\pi_0$ is the distribution
of the state without any observation. If we further assume
that it is the stationary distribution of the state process, i.e., $x^*$ in
(49), then both of the processes $\{Z_n\}_{n=-\infty}^{\infty}$ and $\{S_n\}_{n=-\infty}^{\infty}$
are stationary. So for $\nu = x^*$, $H_Z^n = H(Z_n \mid Z_0^{n-1}) = \sigma_n$,
and similarly $H_{S/Z}^n = H(S_n \mid Z_0^{n-1}) = \beta_n$, and the sequences
$\sigma_n$ and $\beta_n$ converge monotonically from above to their
limits. Therefore, $H_Z^n$ and $H_{S/Z}^n$ as defined in (54) for $\nu = x^*$
are always monotonically decreasing sequences in $n$. Figure 1
exemplifies this fact.

VI. Conclusion 

HMP is a process described by its relation to a Markov 
state process which has stochastic transition to the next state 
independent of the current realization of the process. In this 
paper we showed that HMP can be better described and more 
rigorously analyzed by iterated function systems whose state 
transitions are deterministically related to the process. In both 
descriptions the state is hidden and the process at any time is 
stochastically related to the state at that time. 

In this paper we also introduced the concept of estimation
entropy for a pair of joint processes, which has practical
applications. The entropy rate for a process, like HMP, which
is correlated to another process can be viewed as the self
estimation entropy. Both entropy rate and estimation entropy
for the hidden Markov process can be analyzed using the
iterated function system description of the process. This
analysis results in integral expressions for these dynamical
entropies. The integral expressions are based on an attractive
and invariant measure of the Markov operator induced by
the iterated function system. These integrals can be evaluated
numerically as the limit of special numerical sequences.

Fig. 1. The convergence of the proposed algorithm to the entropy rate (left)
and estimation entropy of the example hidden Markov process for various $\nu$.